Factor Structure and Psychometric Properties of the Farsi Versions of Empathy and Systemizing Quotient: Short Forms.

Objective: We aimed to examine the validity and reliability of the empathy quotient (EQ) and systemizing quotient (SQ) in a Farsi-speaking population. Method: This study explored the factor structure and psychometric properties of the Farsi translations of the 22-item version of the EQ and the 25-item version of the SQ among 542 young university students. Results: Applying a cross-validation approach, a 14-item two-factor model and a 15-item four-factor model for the Farsi translations of the short versions of the EQ and SQ, respectively, were extracted from the exploratory dataset using exploratory factor analysis (EFA). Confirmatory factor analysis (CFA) on the validation dataset confirmed the factor structures identified by EFA. In addition, acceptable internal consistency and test-retest reliability were demonstrated for the Farsi translations of the 14-item two-factor EQ model and the 15-item four-factor SQ model. Conclusion: The results provide further evidence in favor of the multifactorial constructs of the EQ and SQ and of the validity and reliability of the scales.

The current evidence on the factor structure of the EQ and SQ and on the association between them is controversial. Some studies have favored Baron-Cohen's suggestion that the EQ and SQ measure unifactorial constructs (11-14). However, others have suggested that the EQ and SQ are, in fact, multifactorial constructs (10,15). Additionally, the current literature shows constraints on the replicability of the suggested models for the EQ and SQ. The validity of these models has been left uninvestigated (e.g., 10,11), unsupported (e.g., 8,13), or only marginally supported (e.g., 9) by independent studies. The association between empathizing (E) and systemizing (S) is marked by disagreements both in theory, i.e., whether the E/S factor space is single-axis or dual-axis (for more details see: 16), and on an empirical level. Empirically, previous studies investigating normative populations have reported a small to moderate correlation (17), in a positive or negative direction, between the scores of the 2 measures. The disagreements on the factorial construct of the EQ and SQ are suggested to stem partly from cultural differences among the studied samples (18) and from the use of paper-and-pencil versus online versions of the questionnaires (10). In addition, we propose that decisions made in most of the previous studies regarding a few important methodological issues in the implementation of factor analysis might have played a role in the observed disagreements. First, the use of the "Little Jiffy" procedure (i.e., applying principal components analysis (PCA) with varimax rotation and retaining components based on eigenvalues > 1.0) has been criticized repeatedly, since such a combination of decisions has been found to be prone to producing invalid or distorted results (19). Surprisingly, 5 of 6 studies that investigated the factor structure of the EQ used the "Little Jiffy" in whole or in part (12, 13 and 20). 
This also holds true for one (13) of the 2 studies (along with Samson & Huber, 2010) that explored the factor structure of the SQ. Second, treating ordinal data, such as EQ/SQ data, as continuous can lead to severe bias in parameter estimates, standard errors, or fit indices in structural modeling (21). Such mistreatment can be found almost throughout the literature on the structural modeling of the EQ/SQ. Taken together, it would come as no surprise if a correctly specified model did not fit the data well and, as a result, a plausible model was rejected or a model was introduced that could not be replicated by subsequent studies (19). To address the above concerns, the following solutions have been suggested. First, EFA should be used instead of PCA, because the aim is to investigate the latent variables underlying the EQ or SQ. Second, oblique rotation seems the preferable rotation method, as the factors underlying the EQ/SQ are expected to correlate (22). Third, the number-of-factors problem can be approached using structural equation modeling methods, which provide goodness-of-fit measures for comparing a number of candidate models when performing EFA. Fourth, to treat ordinal data appropriately, factor analysis should be conducted on polychoric correlation matrices as opposed to Pearson correlation matrices. These decisions direct us towards robust weighted least squares (WLS) estimation methods for the factor analysis. Currently, it is suggested that mean- and variance-adjusted WLS (WLSMV) can allow a model to converge even with a small sample size, yield less biased standard errors, and provide more accurate factor loading estimates (21). Finally, a cross-validation approach can be employed to demonstrate whether or not a model obtained by EFA is replicable on an independent sample.
Research goal 1. To explore the factor structure of the Farsi version of the EQ-Short and the SQ-Short.
Research goal 2. To investigate the internal consistency and test-retest reliability of the Farsi versions of the EQ-Short and the SQ-Short.
Research goal 3. To test the association between scores of the Farsi version of the EQ-Short and the SQ-Short.
Research goal 4. To assess whether women outscore men on the Farsi version of the EQ-Short and men outscore women on the Farsi version of the SQ-Short.

Participants
A total of 542 university students volunteered for the present study. Participants were enrolled from 4 different fields of study: science (physical sciences, mathematics, and engineering), humanities, health, and sports. Prior to enrolling, written informed consent was obtained from all participants. The study protocol was approved by the Ethical Committee of Tehran University of Medical Sciences.

Material
The EQ-Short (22 items) and the SQ-Short (25 items) developed by Wakabayashi et al. (13) were employed in this study. These short forms have shown strong correlations with their original versions; moreover, high reliability, with Cronbach's alpha coefficients of .90 and .89, has been reported for the EQ-Short and SQ-Short, respectively, in a sample of 1761 students from Cambridge University, UK (13). Items of these questionnaires are rated on a 4-point scale with the anchors (1) totally agree, (2) agree, (3) disagree, and (4) totally disagree. Each item's rating is transformed into 0, 1, or 2 points, and the points are summed to calculate scores for the EQ-Short and SQ-Short. A strong empathizing/systemizing response to an item is scored 2 points, a slightly empathizing/systemizing response receives a score of 1, and a response that is neither empathizing nor systemizing receives a score of 0. Thus, the total scores can range from 0 to 44 for the EQ-Short and from 0 to 50 for the SQ-Short. Based on standardized protocols (23,24), the EQ-Short and the SQ-Short were separately translated into Farsi. All items from each questionnaire were translated from English to Farsi by 2 independent Farsi-English bilinguals. Inconsistencies between the 2 translations were identified and resolved through discussion, and a single Farsi translation was produced, which was back-translated to English by 2 English-Farsi bilinguals. Inconsistencies in the English back-translations were identified and resolved through comparison with the original English version of the questionnaires. Finally, an expert panel (the translators, a psychologist, a methodologist, and the authors) was convened to discuss and settle the discrepancies observed during the forward and backward translations. 
A preliminary Farsi version of the scales was produced and piloted on 20 university students, and the final Farsi versions of the EQ-Short and SQ-Short (EQ-Short-F and SQ-Short-F) were developed according to the findings of the pilot study.
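The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the text does not say which items are agree-keyed and which are disagree-keyed, so the `example_keying` list below is hypothetical, and the function names are ours rather than part of any published scoring tool.

```python
# Response codes as described in the text:
# 1 = totally agree, 2 = agree, 3 = disagree, 4 = totally disagree.
AGREE_KEYED = {1: 2, 2: 1, 3: 0, 4: 0}      # agreeing is the trait-consistent answer
DISAGREE_KEYED = {1: 0, 2: 0, 3: 1, 4: 2}   # disagreeing is the trait-consistent answer


def score_item(response, agree_keyed):
    """Convert one 1-4 rating into 0, 1, or 2 points."""
    table = AGREE_KEYED if agree_keyed else DISAGREE_KEYED
    return table[response]


def total_score(responses, keying):
    """Sum item points; keying[i] is True when agreeing with item i earns points."""
    if len(responses) != len(keying):
        raise ValueError("responses and keying must have the same length")
    return sum(score_item(r, k) for r, k in zip(responses, keying))


# Hypothetical 22-item keying (alternating), for illustration only.
example_keying = [i % 2 == 0 for i in range(22)]
# A respondent giving the strongest keyed answer to every item reaches the maximum.
strongest = [1 if k else 4 for k in example_keying]
print(total_score(strongest, example_keying))  # 44
```

With 25 items instead of 22, the same function yields the 0 to 50 range stated for the SQ-Short.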

Analysis
Factor analysis was conducted using Mplus 7 for Windows. All other statistical analyses were performed with SPSS version 16 (SPSS, Inc., Chicago, Illinois). Applying a cross-validation approach, participants were randomly assigned to either an exploratory or a validation dataset of equal size (n = 271 each). EFA was conducted on the exploratory dataset to identify the latent variables underlying the EQ-Short-F and SQ-Short-F. CFA was performed on the validation dataset to assess the replicability of the constructs obtained by EFA. Test-retest reliability of the questionnaires was examined using a subgroup of 20 participants across the 2 split-half datasets who completed the questionnaires on 2 occasions (4-6 weeks apart). Factor analysis was performed on the polychoric correlation matrix, generated between all possible pairs of items of each questionnaire. WLSMV was used as the estimation method. Oblique rotation was applied when necessary. In the exploratory mode, data were fitted to a number of models differing in their number of factors. The number of factors ranged from 1 (i.e., a unifactorial model) to a maximum number, which was obtained according to the eigenvalue-greater-than-one rule and the scree plot (21). Those models that had a root mean square error of approximation (RMSEA) at or below the accepted threshold (0.05 or 0.06) were selected for further investigation (24). The selected models were then modified: items with an unacceptable (< 0.35) or nonsignificant pattern/structure coefficient, or with cross-loadings, could be excluded from the questionnaire. Data on the remaining items were fitted to the same model again and the RMSEA was obtained. This process was continued until the model of best fit was identified. The model of best fit was the one having not only a simple and stable structure but also intuitive interpretability and reasonable validity and reliability when compared with competing models. 
Additionally, the number of indicators for each factor was kept equal to or greater than the recommended value of 3 (21). To be marked as the best fit in CFA, a model had to meet the following goodness-of-fit criteria: a preferably non-significant chi-square test (χ2), a comparative fit index (CFI) and a Tucker-Lewis index (TLI) close to or greater than 0.95, an RMSEA below 0.05 or 0.06, and a weighted root mean square residual (WRMR) less than 1.0 (25,26).
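The random assignment of the 542 participants to two equal halves can be illustrated as follows. This is a minimal sketch using only the Python standard library; the function name, the participant identifiers, and the seed are assumptions made for the example, and the text does not state which tool performed the actual split.

```python
import random


def split_half(participant_ids, seed=0):
    """Randomly assign participants to two equal-sized halves."""
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of participants is needed for equal halves")
    rng = random.Random(seed)  # fixed seed (arbitrary) so the split is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical participant identifiers 1..542.
exploratory, validation = split_half(range(1, 543))
print(len(exploratory), len(validation))  # 271 271
```

EFA would then be run on the first half only, and the resulting model would be tested by CFA on the second half, so that the confirmation does not reuse the data that suggested the model.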

Results
Participants' mean age was 22.1 years (SD = 2.3; range = 17-28). Men (mean age, 22.4 years, SD = 2.3) and women constituted 37% and 63% of the sample, respectively.
1. Factor Analysis
1.1. EQ-Short-F
1.1.1. Exploratory Factor Analysis: Univariate score distributions from the exploratory dataset appeared to be moderately asymmetric, since the largest skew was -1.46 and the largest kurtosis was 2.18, indicating a violation of the multivariate normality assumption. Based on the eigenvalues and scree plot, EFA suggested that 1 or 2 factors should be retained. Therefore, models with a number of factors ranging from 1 to 3 were fitted to the data. The RMSEA values were 0.08, 0.06, and 0.06 for the one-, two-, and three-factor models, respectively, with no negative residual variances. According to the RMSEA values (≤ 0.06), the two- and three-factor models were suggested to be appropriate. Regarding the two-factor promax-rotated model, factor 1 was composed of 3 social skills items (items 3, 4, and 12) and 3 emotional reactivity items (items 7, 11, 17). Thus, factor 1 was labeled "social/emotional". Factor 2 consisted of 8 cognitive empathy items (6, 9, 10, 16, 18, 19, 20 and 21), so it was labeled "cognitive empathy". The rest of the items were excluded from subsequent analysis, as they did not load on either of the factors (items 2, 5, 22) or showed cross-loadings (items 1,8,13,14,15). The RMSEA value for the remaining 14 items loaded onto 2 factors dropped from 0.06 to 0.05, indicating an improvement in model fit.
Regarding the three-factor promax-rotated model, 3 (items 1, 14, and 16) and 6 (items 6,9,18,19,20,21) "cognitive empathy" items constituted factors 1 and 3, respectively. Factor 2 consisted of 3 "social skills" items (items 3, 4, 12) and 3 "emotional reactivity" items (items 7, 11, 17). The remaining items showed unacceptable pattern coefficients (items 2, 5, 22) or cross-loadings (items 10 and 13) and were eliminated from the subsequent analyses. The RMSEA value for the remaining 17 items loaded onto 3 factors remained equal to 0.06. In the new promax-rotated pattern coefficient matrix, items 8 and 15 showed cross-loadings and were removed from the analysis. The RMSEA for the three-factor model with the remaining 15 items decreased from 0.06 to 0.05, indicating an improvement in fit. However, this model appeared to be problematic in factor interpretability, as it had 2 cognitive empathy factors. Hence, we rejected the model and selected the two-factor model with 14 items as the best fit model (Table 1). Factors 1 and 2 were significantly intercorrelated at r = 0.28 (Pearson correlation; Table 1).
1.1.2. Confirmatory Factor Analysis: Using CFA, the adequacy of the 14-item two-factor model and of 2 competing models was evaluated: (1) the 22-item one-factor model (13) and (2) the 13-item one-factor model (20). The sample's data appeared to be moderately asymmetric, as the largest skew and kurtosis were -1.33 and 1.74, respectively. Significant multivariate skewness and kurtosis indicated a violation of the multivariate normality assumption. The 14-item two-factor model showed acceptable fit (RMSEA = 0.059, 90% CI = 0.045-0.073; CFI = 0.96; TLI = 0.96; WRMR = 0.89), except for the significant chi-square test (χ2(76) = 148.5, p < 0.001). In contrast, the 2 competing models yielded unacceptable fits to the data (see Table 2).
1.1.3. Internal Consistency and Sex Differences: Table 3 separately summarizes the Cronbach's alpha coefficients and sex differences for the exploratory and validation datasets. Cronbach's alpha values were generally acceptable. The lowest, 0.65, corresponded to the social/emotional factor in the exploratory dataset; this value fell below the conventional acceptable level, perhaps because the factor has only 6 items. Item-total correlations were all above 0.25, except for item 12 in the social/emotional subscale, which was 0.22 for the exploratory dataset and 0.19 for the validation dataset. No significant sex differences were found, although women scored slightly higher than men in all comparisons (Table 3).
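For readers less familiar with the internal consistency statistic reported here, the following is a minimal sketch of how Cronbach's alpha is computed from item-level scores. It uses only the Python standard library; the function name and the toy data are ours and are not taken from the study (the reported analyses were run in SPSS).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (made up): 4 respondents, 2 items scored 0-2.
toy = [[2, 1], [1, 2], [0, 0], [2, 2]]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as an index of how consistently the items of a subscale measure the same construct.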
1.2. SQ-Short-F
1.2.1. Exploratory Factor Analysis: The univariate distributions of the observed variables showed moderate asymmetry, as the largest skew and kurtosis were both -1.10. Both multivariate skewness and kurtosis were statistically significant, indicating a violation of the multivariate normality assumption. Inspection of the primary pattern coefficient matrix generated by EFA revealed several items with cross-loadings, leading us to choose direct quartimin over promax rotation (27). According to the eigenvalues and scree plot, either a one- or a four-factor model seemed adequate. Thus, models with a number of factors ranging from 1 to 5 were evaluated to identify the best fit model. The RMSEA for models with all 25 items of the SQ-Short-F was 0.07 for one factor, 0.07 for two factors, 0.06 for three factors, 0.05 for four factors, and 0.04 for five factors, with no negative residual variances. Hence, a four- or five-factor solution was suggested to fit the data adequately. When the pattern coefficients of the four-factor model were reviewed, 3 items (7, 18, and 23) were found to limit the interpretability of the model, perhaps because of misleading translations. For instance, item 23 (When I'm in a plane, I do not think about the aerodynamics) appears to refer to the structural aspects of a plane in the original language. However, the Farsi translation of this item emphasizes flight and navigation rather than the aircraft structure and the motion of air. Similar possible misconceptions were noticed for the other 2 items. Such semantic differences could create difficulties for the structural modeling. Hence, to adhere to the original meanings of the items and to increase the replicability of the results, these 3 items were removed from the analysis. EFA with the WLSMV estimator and direct quartimin rotation was then repeated on the remaining 22 SQ-Short-F items loaded onto four factors. 
Factor 1 consisted of items 2, 4, 6, and 14, which could be regarded as pattern/strategy, except for item 2 (If there was a problem with the electrical wiring in my home, I'd be able to fix it myself.), which is a DIY item. It also loaded significantly on factor 3 and was thus removed from subsequent analysis. Factor 2 consisted of items 3,8,10,12,13,15,17,19, and 25, which could be conceptualized as "technicity", except for items 10, 12, and 15. Items 10, 12, and 15 seemed to be related to structure, science, and DIY, respectively. In addition, they showed cross-loadings, which led us to eliminate them from the analysis. Factor 3 involved items 9, 11, and 22, which could best be described as topography. Factor 4 involved items 10, 21, and 24, which could be regarded as natural systems. The remaining 3 items (1, 5 and 16) were removed because their pattern coefficients fell short of the cut-off or were nonsignificant. Three more items (8, 9 and 11) exhibited significant cross-loadings; of these, items 8 and 11 had pattern coefficients low enough for the cross-loadings to be ignored. The pattern coefficients of item 9 on factors 2 and 3 were 0.39 and 0.55, respectively. Because item 9 formed a meaningful factor (factor 3) together with items 11 and 22, and to comply with the three-indicator-per-factor recommendation (21), this item was retained. The RMSEA for the remaining 15 SQ-Short-F items loaded onto four factors dropped from 0.05 to 0.04, with no negative residual variances, indicating an improvement in model fit (Table 4). There were significant (at p < 0.05) intercorrelations between factors 1 and 2 (r = 0.27), factors 2 and 3 (r = 0.28), factors 2 and 4 (r = 0.32), and factors 3 and 4 (r = 0.30; see Table 3). Regarding the five-factor model, we could not find a meaningful structure even after modifying the model by removing items with nonsignificant or low pattern coefficients or cross-loadings. Therefore, this model was rejected and the 15-item four-factor model was suggested as the best fit model. 
1.2.2. Confirmatory Factor Analysis: In this phase, the adequacy of the 15-item four-factor model and of 3 competing models was tested: (1) the 25-item one-factor model (13); (2) the 13-item one-factor model (20); and (3) the 18-item four-factor model (10). Because items 7, 18, and 23 had been removed in the exploratory mode due to misleading Farsi translations, the adequacy of the competing models was tested with and without these 3 items. Table 5 therefore reports 2 sets of goodness-of-fit indices for each competing model. The largest skewness and kurtosis values for the current sample were -1.17 and -0.97, respectively, indicating a moderately asymmetric distribution. The multivariate normality assumption was violated, as shown by statistically significant multivariate skewness and kurtosis statistics. Regarding the 15-item four-factor model, the following fit statistics were obtained: χ2(84) = 184.3, p < 0.001; RMSEA = 0.07, 90% CI = 0.05, 0.08; CFI = 0.90; TLI = 0.88; WRMR = 1.01. The modification indices suggested that the model would improve further if items 3 (I rarely read articles or web pages about new technology.) and 17 (I find it difficult to understand information the bank sends me on different investment and saving systems.) were allowed to load onto both factors 1 and 2. Goodness-of-fit indices for this modified model were as follows: χ2(82) = 152.6, p < 0.001; RMSEA = 0.06, 90% CI = 0.04, 0.07; CFI = 0.93; TLI = 0.91; WRMR = 0.90. Except for the statistically significant chi-square statistic, the indices were in favor of a relatively acceptable model. There were significant (p < 0.05) intercorrelations between factors 1 and 2, 1 and 3, and 1 and 4 at r = 0.70, 0.58, and 0.34, respectively, between factors 2 and 3, and 2 and 4 at r = 0.65 and 0.51, respectively, and between factors 3 and 4 at r = 0.34 (Table 5). None of the competing models could provide an acceptable fit to the data (see Table 5). 
The 16-item four-factor model adapted from the 18-item four-factor model of Ling et al. (10), from which the 2 problematic items were excluded, was discarded because of the occurrence of a Heywood case.
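As a rough check on how the reported fit statistics relate to one another, the RMSEA point estimate can be recomputed from a chi-square value, its degrees of freedom, and the sample size. The sketch below uses the common textbook formula with N - 1 in the denominator, which is an assumption on our part: Mplus derives the WLSMV chi-square and RMSEA through its own adjusted procedure, so agreement with the reported values is approximate rather than guaranteed. The `meets_criteria` helper applies a strict reading of the cut-offs from the Analysis section ("close to 0.95" is treated as at least 0.95), which is also our simplification.

```python
import math


def rmsea(chi_square, df, n):
    """Textbook RMSEA point estimate from chi-square, degrees of freedom, and sample size."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def meets_criteria(rmsea_value, cfi, tli, wrmr):
    """Strict reading of the CFA cut-offs (assumed exact thresholds)."""
    return rmsea_value <= 0.06 and cfi >= 0.95 and tli >= 0.95 and wrmr < 1.0


# Chi-square values reported for the validation half (n = 271).
eq_two_factor = rmsea(148.5, 76, 271)    # EQ-Short-F, 14-item two-factor model
sq_four_factor = rmsea(184.3, 84, 271)   # SQ-Short-F, 15-item four-factor model
sq_modified = rmsea(152.6, 82, 271)      # SQ-Short-F after the two added cross-loadings
print(round(eq_two_factor, 3), round(sq_four_factor, 2), round(sq_modified, 2))

# Reported index sets: EQ two-factor model, then unmodified SQ four-factor model.
print(meets_criteria(0.059, 0.96, 0.96, 0.89))
print(meets_criteria(0.07, 0.90, 0.88, 1.01))
```

Under this strict reading the modified SQ model (CFI = 0.93, TLI = 0.91) would also fall short on CFI and TLI, which is consistent with its description as only "relatively acceptable".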

1.2.3. Internal Consistency and Sex Differences:
Cronbach's alpha coefficients for the SQ-Short-F subscale scores were below 0.7, and the coefficient for the total score was 0.73 across the exploratory and validation datasets (Table 3). Given that factors 1, 3, and 4 each had 3 items and factor 2 had 6 items, we interpreted these alpha values as acceptable. Across the 2 subsamples, item-total correlations were above 0.25, except for item 7 from the validation dataset (item-total correlation = 0.18). There were significant sex differences in the total SQ-Short-F and subscale scores, except for the natural systems subscale, across both halves of the study sample (Table 3).
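The argument that short subscales tend to produce lower alpha values can be made concrete with the standardized form of alpha, which depends only on the number of items and the mean inter-item correlation. The sketch below is illustrative only: the mean correlation of 0.3 is a hypothetical value chosen for the example, not an estimate from the study data.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Same hypothetical mean inter-item correlation, different subscale lengths.
for k in (3, 6, 15):
    print(k, round(standardized_alpha(k, 0.3), 2))
```

Holding the inter-item correlation fixed, a 3-item subscale yields a clearly lower alpha than a 6-item or 15-item scale, so a value below 0.7 on a 3-item factor does not by itself indicate poorly related items.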

Discussion
Applying a cross-validation design, the present study investigated the factor structure and psychometric properties of the Farsi versions of the EQ-Short and the SQ-Short in a nonclinical sample of Iranian university students. EFA on the exploratory dataset resulted in a 14-item two-factor model for the EQ-Short-F and a 15-item four-factor model for the SQ-Short-F. Using the validation dataset, CFA confirmed the factorial validity of the models identified by EFA. Regarding the EQ-Short-F, factor 1 (social/emotional) involved 3 emotional reactivity items (7,11,17) and 3 social skills items (3,4,12), and factor 2 (cognitive empathy) consisted of 8 items (6,9,10,16,18,19,20,21). Regarding the 15-item four-factor SQ-Short-F across the exploratory and validation datasets, significant sex differences were found for the total score and the pattern/strategy, technicity, and topography subscales, with men scoring higher than women. These differences showed reasonable observed power, ranging from 0.50 to 0.75. Unexpectedly, the total and subscale scores of the 14-item two-factor EQ-Short-F, as well as the natural systems subscale of the SQ-Short-F, failed to show significant gender differences. These nonsignificant comparisons had low observed power (ranging from 0.05 to 0.16, Cohen's d), which might be taken to suggest a lack of evidence in support of the null hypothesis. However, such an interpretation is flawed, as observed power is a function of the observed p-value (28). Instead, we propose that these unexpected findings can be explained by constraints of our study sample: (1) an entirely nonclinical sample, (2) volunteer participation, and (3) a male-to-female ratio of 1:2, which, although it reflects the sex ratio of the population of Iranian university students, restricts the range of expected values. The association between empathizing and systemizing has been open to debate, as has the correlation between the scores of the EQ and SQ (coefficients, -0.28 to 0.22) (16, 17, 29). 
This study yielded a weak to moderate positive correlation between the scores of the EQ-Short-F and the SQ-Short-F in both halves of the study sample. Furthermore, we found that the correlation was stronger for men than for women, which might suggest that this association is sex-dependent (30). We proposed that the factor structures identified for the EQ and SQ by previous investigations were limited in replicability. For instance, the 28-item three-factor version of the EQ developed by Lawrence et al. (8) failed to be replicated by others (9, 15, 31). As another example, the unifactorial constructs for the SQ suggested by early studies (3,8) were also rejected by some later research (10). The limited replicability of these constructs may stem from inappropriate decisions made during the process of factor analysis. Our methodology was adopted to address such improper choices. CFA on the validation dataset acceptably reproduced the models identified by EFA on the exploratory dataset, supporting the soundness of the proposed methodology.

Limitation
Validation is an ongoing process, and this study is limited by the lack of evidence on the concurrent validity of the EQ and SQ with other related scales.

Conclusion
In conclusion, the present study provided preliminary evidence to support the adequate factorial validity and reliability of the 14-item two-factor EQ-Short-F and 15-item four-factor SQ-Short-F questionnaires. Although different statistical decisions were made in the process of factor analysis, the results were relatively comparable with those of previous reports on the factor structure of the EQ and SQ. The present study adds to previous investigations suggesting multifactorial structures for the EQ and SQ.